
Conversation

@nutanix-Hrushikesh
Contributor

@nutanix-Hrushikesh nutanix-Hrushikesh commented Oct 6, 2025

Description
This PR adds complete support for OpenAI’s image generation endpoint (/v1/images/generations) across the Envoy AI Gateway. It introduces a processor, translation layer, tracing and metrics instrumentation, Brotli decoding, example client/service updates, and repo hygiene improvements.

Changes

  • ExtProc (image generation)

    • Added imageGenerationProcessorFactory and registered it in ExtProc main.
    • Implemented imageGenerationProcessorRouterFilter and imageGenerationProcessorUpstreamFilter.
    • Added request/response header and body processing, including retry handling and auth passthrough.
    • Introduced imageGenerationMetrics and integrated with the processor to record image-specific telemetry.
    • Improved diagnostics with debug logs for processor selection/instantiation when matching routes.
  • Translator (OpenAI to OpenAI)

    • Added ImageGenerationTranslator interface and ImageGenerationMetadata to standardize request/response translation for images.
    • Implemented OpenAI to OpenAI translator:
      • Request body transformation (model overrides, forced mutation).
      • Response headers/body parsing with OpenAI SDK v2 schema to avoid drift.
  • Tracing

    • Extended Tracing API to support image generation with ImageGenerationTracer and ImageGenerationRecorder.
    • Implemented imageGenerationTracer and imageGenerationSpan:
      • Span start, header injection, response recording, and well-defined error termination paths.
    • Added OpenInference-based ImageGenerationRecorder for router filter instrumentation and a Noop tracer.
  • Metrics

    • Implemented ImageGenerationMetrics with methods to record lifecycle events, model/backend selection, token usage, and image generation stats.
    • Extended GenAI metrics with image-specific attributes:
      • genaiAttributeImageCount, genaiAttributeImageModel, genaiAttributeImageSize
    • Added operation type: genaiOperationImageGeneration.
  • Utilities

    • Added decodeContentIfNeeded to support Brotli encoding alongside gzip for modern upstreams (a hedged sketch follows this list).
  • CLI, docs, and examples

    • cmd/aigw/docker-compose.yaml: added image-generation service (curl-based client) modeled after chat/embeddings.
    • cmd/aigw/README.md: added image generation usage (service and curl examples) and embeddings “create-embeddings” instructions; updated OTEL section to include image-generation.
    • OTLP/OTEL Compose flow updated to demonstrate image generation alongside chat and embeddings.
  • Tests and config

    • Added /v1/images/generations route in tests/extproc/vcr/envoy.yaml.
    • Added test coverage for ExtProc, translator, metrics, and tracing (details below).
  • Repo hygiene

    • .gitignore: ignore tests/e2e-inference-extension/logs/ to prevent accidental log check-ins.
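
For illustration, a minimal sketch of the content-decoding helper described under Utilities; the exact signature of decodeContentIfNeeded in this PR may differ, so treat the wiring below as an assumption rather than the merged implementation:

package extproc

import (
	"bytes"
	"compress/gzip"
	"io"

	"github.com/andybalholm/brotli"
)

// decodeContentIfNeeded wraps the response body in a decoding reader when the
// upstream sets Content-Encoding to gzip or br; other encodings pass through.
func decodeContentIfNeeded(body []byte, contentEncoding string) (io.Reader, error) {
	switch contentEncoding {
	case "gzip":
		return gzip.NewReader(bytes.NewReader(body))
	case "br":
		return brotli.NewReader(bytes.NewReader(body)), nil
	default:
		return bytes.NewReader(body), nil
	}
}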

Bug Fixes / Improvements

  • Error handling: Standardized Images API error parsing (including non-JSON upstream errors) via ImageGenerationError.
  • Observability: Full tracing and metrics for image generation requests; span/metric attributes include image count, model, and size (a hedged span sketch follows this list).
  • Compatibility/Performance: Brotli decoding support for modern content encodings; improved debug logging around processor selection and instantiation.
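
A hedged sketch of what recording those span attributes could look like with the OpenTelemetry Go SDK; the tracer name, span name, and attribute keys here are illustrative assumptions, not the PR's final conventions:

package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// startImageGenerationSpan starts a client span for an image generation
// request and records the attributes mentioned above (count, model, size).
func startImageGenerationSpan(ctx context.Context, model, size string, count int) (context.Context, trace.Span) {
	ctx, span := otel.Tracer("aigw/imagegeneration").Start(ctx, "image_generation",
		trace.WithSpanKind(trace.SpanKindClient))
	span.SetAttributes(
		attribute.String("gen_ai.request.model", model),
		attribute.String("gen_ai.image.size", size),
		attribute.Int("gen_ai.image.count", count),
	)
	return ctx, span
}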

Tests

  • Unit tests
    • ExtProc image generation processor: supported routes, upstream scenarios, request/response handling, translator selection by API schema, retry behavior.
    • Translator (OpenAI -> OpenAI): request body transformations (model override, forced mutation), non-JSON error mapping, successful response parsing.
    • Metrics: token usage recording, image generation counters/histograms, header label mapping.
    • Tracing: image tracer/span behavior for basic and multi-image requests, and with pre-existing trace context.
    • OpenInference recorder: attribute construction and hooks for requests, responses, and errors.

Dependencies / Migrations

  • Added
    • github.com/andybalholm/brotli v1.2.0 (Brotli decoding).
    • github.com/xyproto/randomstring v1.0.5 (utility for upcoming features/tests).
  • Existing usage formalized
    • github.com/openai/openai-go/v2 leveraged for schema-safe decoding and tracing types.

Notes for Reviewers

  • Processor & filters: Validate routing and upstream filter logic for /v1/images/generations, including retry and header/body mutation behavior.
  • Translator: Check non-JSON error mapping and OpenAI SDK v2 schema usage to prevent drift; confirm response decoding reliability.
  • Tracing: Verify header injection, span names, and attribute coverage (count/model/size). Confirm OpenInference semantics and Noop behavior are preserved.
  • Metrics: Confirm attribute names, cardinality, and consistency with existing GenAI metrics; review token/image recording correctness.
  • Utilities: Ensure Brotli decode path coexists safely with gzip.
  • Docs & examples: Run the image-generation and create-embeddings services to validate the examples; confirm README clarity (see the request sketch after this list).
  • Repo hygiene: Validate .gitignore path excludes e2e inference extension logs as intended.
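
As a quick way to exercise the new route locally, a hedged sketch of a minimal client call against the gateway; the localhost port, model name, and payload below are illustrative assumptions, not the exact values used by the compose examples:

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// POST a tiny image generation request through the gateway's new route.
	payload := []byte(`{"model":"dall-e-2","prompt":"a tiny red square","n":1,"size":"256x256"}`)
	resp, err := http.Post("http://localhost:1975/v1/images/generations", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}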

@missBerg
Contributor

missBerg commented Oct 6, 2025

@nutanix-Hrushikesh some admin notes on this PR: the DCO and the PR style need to be fixed.

@codefromthecrypt
Contributor

To make this complete I would recommend following the pattern in testopenai (add a cassette for the new request and record it), then record a new span JSON in testopeninference (this ensures the data we capture reflects the actual implementation and isn't accidentally different). Both have READMEs, but let me know if any of it is unclear.

}

// Debug details about the processor selection.
if s.logger != nil {
Contributor

Is this required?

Contributor Author

I'll remove some of the extra logging.

Member

can you remove this entire block

@nutanix-Hrushikesh nutanix-Hrushikesh changed the title End-to-end OpenAI Image Generation support: processor, translator, tracing, metrics aigw: End-to-end OpenAI Image Generation support: processor, translator, tracing, metrics Oct 7, 2025
@nutanix-Hrushikesh nutanix-Hrushikesh changed the title aigw: End-to-end OpenAI Image Generation support: processor, translator, tracing, metrics aigw: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics Oct 7, 2025
@mathetake
Member

@nutanix-Hrushikesh do you want to make it more complete by following @codefromthecrypt's comment?

To make this complete I would recommend following the pattern in testopenai (add a cassette for the new request and record it), then record a new span JSON in testopeninference (this ensures the data we capture reflects the actual implementation and isn't accidentally different). Both have READMEs, but let me know if any of it is unclear.

@mathetake mathetake changed the title aigw: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics feat: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics Oct 7, 2025
@mathetake mathetake changed the title feat: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics feat: end-to-end OpenAI Image Generation support Oct 7, 2025
@nutanix-Hrushikesh nutanix-Hrushikesh force-pushed the image-generation branch 2 times, most recently from 390e4d0 to 7907692 on October 8, 2025 15:42
@nutanix-Hrushikesh
Contributor Author

nutanix-Hrushikesh commented Oct 8, 2025

To make this complete I would recommend following the pattern in testopenai (add a cassette for the new request and record it), then record a new span JSON in testopeninference (this ensures the data we capture reflects the actual implementation and isn't accidentally different). Both have READMEs, but let me know if any of it is unclear.

I have recorded the cassette and span. Please let me know if this is what's expected, @codefromthecrypt.

Contributor

@codefromthecrypt codefromthecrypt left a comment

Getting further! A couple of notes:

// ModelTextEmbedding3Small is the cheapest model usable with /embeddings.
ModelTextEmbedding3Small = "text-embedding-3-small"

// ModelDALLE2 is the DALL-E 2 model usable with /v1/images/generations.
Contributor

Previously, the rationale for an entry here was the cheapest model that can be used to produce test data. I would look up whether this, or a different one, is the cheapest. Only add one, and document it like the others, if it is the cheapest.

Similarly, when you make the cassette you will notice that the existing text-to-speech, image-to-text, etc. requests use the cheapest possible request in terms of cost and size. You can ask AI to help you figure that out; for example, I used Grok to figure out a cheap audio request.

Contributor Author

Used the smaller model gpt-image-1-mini.
Also, there is an intermittent issue in the test: the cassette file is sometimes not saved because the server is closed before the file is written to disk. Added a temporary fix to sleep 5 seconds before closing.

@nutanix-Hrushikesh nutanix-Hrushikesh force-pushed the image-generation branch 2 times, most recently from dc13e02 to dee84d4 on October 10, 2025 12:22
@mathetake
Member

will review next week 🙏 sorry for the delay

Member

@mathetake mathetake left a comment

Thanks @nutanix-Hrushikesh, I left a few minor comments. From now on, could you avoid force-pushing, per CONTRIBUTING.md, to make review easier. Also make sure that you remove any debugging lines.

.env.ollama Outdated
THINKING_MODEL=qwen3:1.7b
COMPLETION_MODEL=qwen2.5:0.5b
EMBEDDINGS_MODEL=all-minilm:33m
IMAGE_GENERATION_MODEL=dall-e-2
Member

how is this Ollama?

Contributor Author

I know this won't work, but there is no image generation model available with Ollama.


// Extract image generation metadata for metrics (model may be absent in SDK response)
imageMetadata.ImageCount = len(resp.Data)
imageMetadata.Model = ""
Member

Like other endpoints, we should assume that the requested model == response model if the response lacks the model name. So could you apply a patch like this?

diff --git a/internal/extproc/translator/imagegeneration_openai_openai.go b/internal/extproc/translator/imagegeneration_openai_openai.go
index 9e5d74ba..d5ab5ff5 100644
--- a/internal/extproc/translator/imagegeneration_openai_openai.go
+++ b/internal/extproc/translator/imagegeneration_openai_openai.go
@@ -6,6 +6,7 @@
 package translator
 
 import (
+       "cmp"
        "encoding/json"
        "fmt"
        "io"
@@ -32,11 +33,12 @@ type openAIToOpenAIImageGenerationTranslator struct {
        // The path of the images generations endpoint to be used for the request. It is prefixed with the OpenAI path prefix.
        path string
        // span is the tracing span for this request, inherited from the router filter.
-       span tracing.ImageGenerationSpan
+       span         tracing.ImageGenerationSpan
+       requestModel internalapi.RequestModel
 }
 
 // RequestBody implements [ImageGenerationTranslator.RequestBody].
-func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, _ *openaisdk.ImageGenerateParams, forceBodyMutation bool) (
+func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, p *openaisdk.ImageGenerateParams, forceBodyMutation bool) (
        headerMutation *extprocv3.HeaderMutation, bodyMutation *extprocv3.BodyMutation, err error,
 ) {
        var newBody []byte
@@ -47,6 +49,7 @@ func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, _
                        return nil, nil, fmt.Errorf("failed to set model name: %w", err)
                }
        }
+       o.requestModel = cmp.Or(o.modelNameOverride, p.Model)
 
        // Always set the path header to the images generations endpoint so that the request is routed correctly.
        headerMutation = &extprocv3.HeaderMutation{
@@ -144,9 +147,9 @@ func (o *openAIToOpenAIImageGenerationTranslator) ResponseBody(_ map[string]stri
                tokenUsage.TotalTokens = uint32(resp.Usage.TotalTokens)   //nolint:gosec
        }
 
-       // Extract image generation metadata for metrics (model may be absent in SDK response)
+       // Extract image generation metadata for metrics.
        imageMetadata.ImageCount = len(resp.Data)
-       imageMetadata.Model = ""
+       imageMetadata.Model = o.requestModel // Model is not present in the response, so we assume the request model == response model.
        imageMetadata.Size = string(resp.Size)
 
        return

Member

@nutanix-Hrushikesh this is still unresolved.

@codecov-commenter

codecov-commenter commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 89.64646% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.60%. Comparing base (80838bc) to head (2d25398).
⚠️ Report is 30 commits behind head on main.

Files with missing lines                                  Patch %   Lines
internal/extproc/imagegeneration_processor.go              87.72%   15 Missing and 12 partials ⚠️
internal/extproc/util.go                                     0.00%   6 Missing ⚠️
...xtproc/translator/imagegeneration_openai_openai.go       93.75%   2 Missing and 2 partials ⚠️
internal/metrics/image_generation_metrics.go                92.85%   2 Missing ⚠️
internal/tracing/tracing.go                                 75.00%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1280      +/-   ##
==========================================
+ Coverage   78.27%   78.60%   +0.32%     
==========================================
  Files         132      139       +7     
  Lines       13349    13745     +396     
==========================================
+ Hits        10449    10804     +355     
- Misses       2260     2287      +27     
- Partials      640      654      +14     

☔ View full report in Codecov by Sentry.

@nutanix-Hrushikesh nutanix-Hrushikesh force-pushed the image-generation branch 2 times, most recently from 82361c0 to fa27f96 on October 16, 2025 14:33
@mathetake
Member

@nutanix-Hrushikesh I see a bad merge here; could you clean up the commits?

Comment on lines 126 to 142
  # completion is the standard OpenAI client (`openai` in pip), instrumented
  # with the following OpenTelemetry instrumentation libraries:
  # - openinference-instrumentation-openai (completions spans)
  # - opentelemetry-instrumentation-httpx (HTTP client spans and trace headers)
  completion:
    build:
      context: ../../tests/internal/testopeninference
      dockerfile: Dockerfile.openai_client
      target: completion
    container_name: completion
    profiles: ["test"]
    env_file:
      - ../../.env.ollama
      - .env.otel.${COMPOSE_PROFILES:-console}
    environment:
      - OPENAI_BASE_URL=http://aigw:1975/v1
      - OPENAI_API_KEY=unused
Member

Do not delete irrelevant things.

Member

Can you not lie?

  # completion is the standard OpenAI client (`openai` in pip), instrumented
  # with the following OpenTelemetry instrumentation libraries:
  # - openinference-instrumentation-openai (completions spans)
  # - opentelemetry-instrumentation-httpx (HTTP client spans and trace headers)
  completion:
    build:
      context: ../../tests/internal/testopeninference
      dockerfile: Dockerfile.openai_client
      target: completion
    container_name: completion
    profiles: ["test"]
    env_file:
      - ../../.env.ollama
      - .env.otel.${COMPOSE_PROFILES:-console}
    environment:
      - OPENAI_BASE_URL=http://aigw:1975/v1
      - OPENAI_API_KEY=unused

Contributor Author

My bad, I think I was referring to the old file.

// Clear any existing env vars
t.Setenv("OPENAI_API_KEY", "")
t.Setenv("OPENAI_BASE_URL", "")
t.Setenv("AZURE_OPENAI_API_KEY", "")
Member

how is this relevant to this PR?

Contributor Author

Tests were failing locally; I added this to fix that, but I'll remove it.

{
	name: "run no arg",
	args: []string{"run"},
	env:  map[string]string{"OPENAI_API_KEY": "", "AZURE_OPENAI_API_KEY": ""},
Member

same

Comment on lines 677 to 678
ctx, cancel := context.WithCancel(t.Context()) //nolint: govet
ctx, cancel := context.WithCancel(t.Context())
defer cancel()
Member

irrelevant

Signed-off-by: Hrushikesh Patil <[email protected]>

// ImageGenerationError represents an error response from the OpenAI Images API.
// This schema matches OpenAI's documented error wire format.
type ImageGenerationError struct {
Member

Why can't you use openai.Error like in other places?

Member

@mathetake mathetake Oct 22, 2025

Can you revert the unrelated changes like the grouping etc., as well as bring back the reference URL to OTel?

The reason is that the "grouping comment" will be considered a comment for the first entry only and not for the others. I feel that is confusing, so I would rather not do that. Instead, if we really want it, we should put documentation comments on each of the constants rather than the partial grouping in the current state.

Contributor Author

sure

Contributor Author

Done

Member

@mathetake mathetake left a comment

almost there!

Use a different AIGW because Ollama does not currently support the image generation model.
This setup requires the OpenAI API key to be set in environment variables.

This is a temporary workaround until Ollama adds image generation support.

Signed-off-by: Hrushikesh Patil <[email protected]>
Comment on lines 59 to 63
imageInfo: mustRegisterHistogram(
	meter,
	"ai_gateway.image.generation",
	metric.WithDescription("Image generation request marker with image-specific attributes"),
),
Member

@mathetake mathetake Oct 22, 2025

Why do you need this? Can't you just use gen_ai.client.operation.duration or whatever existing, well-defined OTel metric, and add additional attributes for the image-specific data? I don't think this additional custom metric (not even documented in this PR) is necessary. Can you remove it?

Member

I even think having them all as attributes is a really bad idea, as "size" has infinite cardinality, and "image count" does as well.

// for metrics and observability.
type ImageGenerationMetadata struct {
	// ImageCount is the number of images generated in the response.
	ImageCount int
	// Model is the AI model used for image generation.
	Model string
	// Size is the size/dimensions of the generated images.
	Size string
}

So can you:

  • Remove this metric.
  • Remove ImageGenerationMetadata
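
For context, a hedged sketch of the alternative suggested above: record on the existing GenAI duration histogram and attach the image size as a bounded-cardinality attribute instead of keeping a dedicated metric. The attribute keys and function name here are assumptions for illustration:

package metrics

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordImageGeneration records the request duration on a shared GenAI
// histogram and tags it with the operation name and image size, avoiding a
// separate image-specific instrument.
func recordImageGeneration(ctx context.Context, duration metric.Float64Histogram, seconds float64, size string) {
	duration.Record(ctx, seconds, metric.WithAttributes(
		attribute.String("gen_ai.operation.name", "image_generation"),
		attribute.String("gen_ai.image.size", size), // e.g. "1024x1024"
	))
}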

Contributor Author

For size, there are only 7-8 size options. Image count has infinite cardinality. Should I keep size?

Member

Yes, fine, but please remove this metric anyway.

@mathetake mathetake enabled auto-merge (squash) October 23, 2025 16:54
@mathetake
Member

@nutanix-Hrushikesh thank you for the multiple iterations. It's good to finally see this landing!

@mathetake mathetake merged commit fa7749a into envoyproxy:main Oct 23, 2025
30 checks passed
@nutanix-Hrushikesh
Contributor Author

It's good to finally see this landing!

Thanks for all the feedback and guidance along the way!

AyushSawant18588 pushed a commit to AyushSawant18588/ai-gateway that referenced this pull request Oct 24, 2025
**Description**
This PR adds complete support for OpenAI’s image generation endpoint
(/v1/images/generations) across the Envoy AI Gateway. It introduces a
processor, translation layer, tracing and metrics instrumentation,
Brotli decoding, example client/service updates, and repo hygiene
improvements.

---------

Signed-off-by: Hrushikesh Patil <[email protected]>